Statistically improbable phrase : English Wikipedia
Statistically improbable phrase
Statistically Improbable Phrases (SIPs) are words or phrases that occur more frequently in a document (or collection of documents) than in some larger corpus.〔http://courses.cms.caltech.edu/cs145/2011/wikipedia.pdf〕〔https://www.plagiarismtoday.com/2012/07/03/how-long-should-a-statistically-improbably-phrase-be/〕〔http://bioinformatics.oxfordjournals.org/content/26/11/1453.full〕
Amazon.com uses this concept to determine keywords for a given book or chapter, since the keywords of a book or chapter are likely to appear disproportionately often within that section.〔What are Statistically Improbable Phrases?〕 Christian Rudder has also used this concept, with data from online dating profiles and Twitter posts, to determine the phrases most characteristic of a given race or gender in his book ''Dataclysm''.
== Example ==
In a document about computers, the most common word is likely to be "the". But "the" is the most commonly used word in the English language, so almost any document will use it very frequently, and its frequency says little about this particular document. A word like "program", however, might occur in the document at a much higher rate than its average rate in the English language. It is unlikely to be frequent in an arbitrary document, yet it ''is'' frequent in this one, so "program" would be a statistically improbable phrase.
The statistically improbable phrases of Darwin's ''On the Origin of Species'' are: ''temperate productions, genera descended, transitional gradations, unknown progenitor, fossiliferous formations, our domestic breeds, modified offspring, doubtful forms, closely allied forms, profitable variations, enormously remote, transitional grades, very distinct species'' and ''mongrel offspring''.〔"Sociologically Improbable Phrases", Crooked Timber, April 2005〕
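The comparison behind the "the" versus "program" example can be sketched in a few lines of Python. This is only an illustrative sketch, not the method Amazon.com or any other source actually uses: the function names, the toy texts, and the choice of a smoothed frequency ratio as the score are all assumptions made here. It handles single words only; multi-word phrases could be scored the same way by counting n-grams instead of words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def improbability_scores(document, corpus, smoothing=1.0):
    """Score each word in `document` by how over-represented it is.

    Hypothetical scoring rule (an assumption of this sketch): the word's
    relative frequency in the document divided by its additively smoothed
    relative frequency in the reference corpus. Scores well above 1 mark
    words that are unusually frequent in the document.
    """
    doc_counts = Counter(tokenize(document))
    corpus_counts = Counter(tokenize(corpus))
    doc_total = sum(doc_counts.values())
    corpus_total = sum(corpus_counts.values())
    vocab_size = len(set(doc_counts) | set(corpus_counts))

    scores = {}
    for word, count in doc_counts.items():
        doc_rate = count / doc_total
        # Smoothing keeps the ratio finite for words absent from the corpus.
        corpus_rate = (corpus_counts[word] + smoothing) / (
            corpus_total + smoothing * vocab_size
        )
        scores[word] = doc_rate / corpus_rate
    return scores


# Toy data, invented for illustration only.
document = "the program runs the program and the program stops"
corpus = (
    "the cat sat on the mat and the dog sat on the rug "
    "the program was on the table and the cat ran"
)

scores = improbability_scores(document, corpus)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:3])
```

In this toy data "the" and "program" occur equally often in the document, but "program" is rare in the reference corpus, so it receives the higher score. A real application would need a far larger reference corpus for the rates to be meaningful.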
== See also ==

*Googlewhack — a pair of words occurring on a single webpage, as indexed by Google
*tf-idf — a statistic used in information retrieval and text mining.

Excerpt source: Wikipedia, the free encyclopedia